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o 
o 

(N 
•*-> 
O 

O 

m 



43 

6 



(N 
> 

On 

in 



Eyal Neistein and Avishai Dekel 

Racah Institute of Physics, The Hebrew University, Jerusalem, Israel 
e-mails: eyal_n@phys.huji.ac.il; dekel@phys.huji.ac.il 



ABSTRACT 

We present a simple and efficient empirical algorithm for constructing dark-matter halo 
merger trees that reproduce the distribution of trees in the Millennium cosmological N- 
body simulation. The generated trees are significantly better than EPS trees. The algorithm 
is Markovian, and it therefore fails to reproduce the non-Markov features of trees across short 
time steps, except for an accurate fit to the evolution of the average main progenitor. How- 
ever, it properly recovers the full main progenitor distribution and the joint distributions of 
all the progenitors over long-enough time steps, Aw ~ Az > 0.5, where w ~ 1 .69 / D(t) is 
the self-similar time variable and D(t) refers to the linear growth of density fluctuations. We 
find that the main progenitor distribution is log-normal in the variable a 2 (M), the variance of 
linear density fluctuations in a sphere encompassing mass M. The secondary progenitors are 
successfully drawn one by one from the remaining mass using a similar distribution function. 
These empirical findings may be clues to the underlying physics of merger-tree statistics. As 
a byproduct, we provide useful, accurate analytic time-invariant approximations for the main 
progenitor accretion history and for halo merger rates. 

Key words: cosmology: theory — dark matter — galaxies: haloes — galaxies: formation — 
gravitation 
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1 INTRODUCTION 

Dark-matter (DM) haloes are the building blocks of non-linear 
structure in the universe. They are the spheroidal gravitating sys- 
tems in virial equilibrium within which galaxies form and live. The 
spherical collapse model in a cosmological background implies that 
the outer, virial radius of a halo can be defined by the radius encom- 
passing a mean overdensity of A ~ 200 compared to the universal 
mean. The haloes are assumed to assemble hierarchically bottom- 
up, starting from Gaussian random initial density fluctuations that 
grow by gravitational instability and eventually detach from the ex- 
panding background, collapse and virilize. Most of the growth of 
a halo can be viewed as a sequence of mergers of haloes above an 
arbitrary minimum mass, termed "progenitors", with the rest of the 
assembled mass considered "smooth accretion". The merger trees, 
describing the whole merger histories of DM haloes, serve as the 
backbone of galaxy formation. 

The statistics of the halo distribution can be approximated by 
the Press-Schechter formalism (Press & Schechter 1974). Its exten- 
sion (EPS, Bond et al. 1991; Lacey & Cole 1993) provides an ap- 
proximate description of the statistics of merger trees as a stochastic 
process in which the probability for the set of progenitors is given. 
Both Press-Schechter & EPS are based on the initial fluctuation 
power spectrum combined with the analytic model of cosmologi- 
cal spherical collapse. The EPS formalism is widely used in studies 
of structure formation (e.g. Lacey & Cole 1994; Mo & White 1996; 
Hemquist & Springel 2003), especially through algorithms for the 
construction of random realizations of merger-trees (Kauffmann & 



White 1993; Sheth & Lemson 1999; Somerville & Kolatt 1999; 
Cole et al. 2000). These algorithms enable detailed "semi-analytic" 
simulations of galaxy formation models, as they are fast and allow 
a broad range of halo masses. 

While the EPS trees are useful for semi-quantitative studies, 
their accuracy may be insufficient for detailed comparisons with 
observations. When compared to merger trees extracted from N- 
body simulations, the EPS trees show non-negligible deviations, 
e.g., in the number of progenitors (Sheth & Tormen 2002), the 
growth history of the main progenitor (Wechsler et al. 2002) and 
the mass contained within all the progenitors (Neistein et al. 2006). 
In addition, the EPS theory does not uniquely define the full joint 
distribution of progenitor masses and the associated merger rates 
(e.g. Somerville et al. 2000). Therefore, different EPS-based algo- 
rithms may lead to trees with different statistical characteristics and 
predict different merger rates. 

In terms of accuracy, TV-body simulations should generate 
"true" merger trees. With the availability of large-volume simula- 
tions such as the Millennium Run (Springel et al. 2005), cosmic 
variance is no longer an issue. As a result, haloes in the mass range 
that is relevant for galaxy formation are well sampled. The accuracy 
of the DM dynamics is limited only by the numerical resolution of 
particle mass and gravitational force. However, non-trivial difficul- 
ties are involved in the process of identifying haloes (Davis et al. 
1985; Bullock et al. 2001; Springel et al. 2001) and in linking them 
to their earlier progenitors (e.g. Springel et al. 2005; Harker et al. 
2006). The resultant trees may depend on the algorithms adopted 
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for these tasks, which could be quite arbitrary. The freedom in 
defining iV-body merger trees partly reflects uncertainties in the 
adopted halo definition and its possible variation as a function of 
time or mass. Several authors define the virial radius based on a 
mean overdensity A(z) that varies in time following the spherical 
top-hat model, while others use the radius R200 based on a fixed 
A = 200 (e.g. Cole & Lacey 1996; Wechsler et al. 2002; Cohn & 
White 2007). In fact, the whole concept of a "virial radius" is put 
to some doubt by the finding that the virial kinematics extends far 
beyond the conventional halo radii around haloes that are signifi- 
cantly smaller than the non-linear clustering scale M* (Prada et al. 
2006). 

A merger tree is a Markov chain if for any halo of a given mass 
M at time t, the probability distribution of progenitors at any other 
time is fully determined by M and t. In particular, in a Markov tree, 
the history of each halo within the tree does not depend on its future 
properties. Markov trees are therefore easy to handle. Using self- 
similar time variable, the tree can be fully constructed using fixed 
probabilities of progenitor masses across small time steps. Since 
the history of a halo in a Markov tree is independent of its future, 
the halo properties do not depend on the large-scale environment. 

EPS trees are Markovian if the haloes are defined by a con- 
volution with a top-hat window in Fourier space, but any other 
window introduces correlations between the fluctuations on differ- 
ent scales, which lead to non-Markov trees. This was first formu- 
lated by Bond et al. (1991), and implemented, e.g., by Amosov 
& Schuecker (2004); Zentner (2007). Indeed, iV-body trees are in 
general non-Markovian, making their statistical description more 
complicated. This is evident from the detection of environment 
dependence in halo histories that are extracted from cosmologi- 
cal simulations (Gao et al. 2005; Harker et al. 2006). These non- 
Markov features may arise from the correlations introduced by the 
smoothing of the initial density field, from the finite relaxation time 
associated with the non-linear assembly process, and from tidal ef- 
fects including tidal stripping of haloes as they move near or inside 
other haloes (Diemand et al. 2007; Desjacques & Dekel 2007; Hahn 
et al. 2007). We note that the deviations from a Markov behaviour 
may depend on the algorithm used to construct the merger trees. 

Our goal here is to develop a simple and practical Markov 
algorithm for constructing merger trees, that will be easy to im- 
plement across short time steps, and will provide a good fit to 
the statistics of iV-body merger trees once considered across large 
enough time steps. We will find that this is a doable task once we 
identify the natural variables of time and mass, which permit time- 
invariance and a robust functional shape for the distribution of pro- 
genitors in all halo masses. We aim to demonstrate that this algo- 
rithm provides a substantially better fit to the iV-body trees than 
the EPS-based algorithms. A related analysis is provided indepen- 
dently by Parkinson et al. (2007), based on a different method and 
somewhat different merger trees. 

The outline of this paper is as follows. In §2 we briefly de- 
scribe the Millennium Run and the merger trees used. In §3 we 
extract the main-progenitor history from the simulation, and show 
how we reproduce it with a Markov process. In §4 we present the 
algorithm for constructing full merger trees and demonstrate that 
it is a significant improvement over EPS trees. In §5 we examine 
merger rates and mutual probabilities between progenitors. In §6 
we discuss the limitations of a Markov model. Finally, in §7, we 
summarize our results and discuss them. 



Table 1. The merger trees are divided into 4 bins according to the mass Mo 
of the final halo at z = 0. Each row of the table refers to a different bin, 
defined by M low ^ Mo ^ M high , and consisting of N trees. All masses 
are in units of K^ 1 Mq. 



Average mass 


Mow 


M high 


N 


10 11 


10 11 


1.05 x 10 11 


3 x 10 5 


1.4 x 10 12 


10 12 


2 x 10 12 


2 x 10 s 


2 x 10 13 


10 13 


5 x 10 13 


4 x 10 4 


2.1 x 10 14 


10 14 


10 15 


3 x 10 3 



2 THE MILLENNIUM SIMULATION 

Merger trees are obtained from the Millennium Run iV-body 
simulation (Springel et al. 2005, hereafter MR), carried out 
by the Virgo Consortium. The cosmology is assumed to be 
ACDM, with the cosmological parameters (CI a, os, h) = 
(0.75, 0.25, 0.9, 0.73). The simulation follows the evolution of 
2, 160 3 dark matter particles in a periodic box of a comoving side 
500/i _1 Mpc from 2 = 127 to the present epoch. The particle mass 
is 8.6 x 10 s h~ Mq, and the gravitational force has a comoving 
softening length of 5/t -1 kpc. The particle data were stored at 64 
times, most of which are equally spaced in log(l + 2) between 
2 = 20 and 0. These output snapshots were then used for con- 
structing merger trees. 

We use the merger trees constructed from the MR using an 
FOF algorithm as described in Harker et al. (2006). This algorithm 
is suitable here because it focuses on distinct haloes that are not 
subhaloes of bigger haloes. FOF trees are especially appropriate 
for our purpose because such Af-body trees were compared to EPS 
trees in the past (e.g., Lacey & Cole 1994). In practice, the to- 
tal mass associated with a halo is first estimated using a linking 
length of b = 0.2 compared to the mean near-neighbour distance. 
The mass estimate is then modified slightly in order to properly 
handle substructure, and some haloes are actually split when the 
automatic FOF linking seems unreasonable based on certain cri- 
teria (see Harker et al. 2006, for more details). A non-standard 
procedure in the construction of merger trees is that the search 
for the descendant halo of a given halo is pursued over the sub- 
sequent five snapshots. Such subtle details of the halo finder and 
the tree-construction algorithm may have non-negligible effects on 
the statistics of the merger trees. 

For most purposes we focus on haloes that are identified at the 
present epoch, 2 = 0. We divide these haloes according to their 
final mass Mo at 2 = (the tree "trunk") into four representative 
bins, as listed in table 1. The bins become broader at larger masses 
to ensure a sufficient number of haloes in each bin for good statis- 
tics. 



3 MAIN PROGENITOR HISTORY 

The history of the "main progenitor" (hereafter MP) is constructed 
by following back in time the most massive progenitor in each 
merger event. The mass growth of the MP is interpreted for cer- 
tain purposes as the mass growth history of the final halo, e.g., 
when identifying a characteristic assembly time for the halo. This 
has been useful in quantifying important aspects of the merger his- 
tories of haloes (Lacey & Cole 1993; Wechsler et al. 2002; van 
den Bosch 2002b; Li et al. 2007), and helped in the understanding 
of certain issues concerning galaxy formation (e.g. van den Bosch 
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2002a; Bimboim et al. 2007). We denote by P 1 (M 1 \M , z, z ) the 
conditional probability to have at z a MP of mass Mi, given that 
it has merged by zo into a halo of mass Mo. We will investigate 
below to what extent Pi can be fitted by a unique log-normal dis- 
tribution function, for all masses and at all times. In particular, we 
will study the requirements from the length of the time step for this 
to be a good approximation. This will allow us to construct the full 
statistics of the merger history using a Markov chain model. 

3.1 Natural Variables 

The first key step is to identify a natural time-variable r under 
which the trees are time-invariant. In particular, we wish Pi to 
depend only on At = t(z) — t(zq) and be independent of 
zo- The natural time variable emerging from the EPS theory is 
uo = 8 c (z) /D(z), where 8 c (z) ~ 1.69 with a weak dependence 
on z and D[z) is the cosmological linear growth rate (see ap- 
pendix A). Any time dependence in EPS trees enters only through 
Auj = lo(z) — cu(zo). Indeed, previous analytical derivations of 
MP "formation time" (Lacey & Cole 1993) and the full average 
mass history (Neistein et al. 2006), based on EPS, used uj as the 
time variable. Alternatively, one could try z as the time variable. 
This led van den Bosch (2002b, based on EPS trees) and Wechsler 
et al. (2002) to formulae for the average MP history in good agree- 
ment with earlier iV-body simulations for haloes that are identified 
at zq = 0. Wechsler et al. (2002) also proposed that Az allows a 
good time-invariant generalization to other zo measurement times 1 . 
We next test to what extent these time variables lead to time invari- 
ance of the MP distribution Pi in the Millennium Run. 

Figure 1 shows the average MP history for haloes of mass Mo 
at zo, where Zo is ranging from to 2.4 for each given halo mass. 
These histories are shown as a function of Auj and as a function of 
Az. We see that Auj provides good time invariance, with a scatter 
of less than 10% in the MP mass between different zo values. We 
also see that the use of Az leads to a reasonable time invariance, but 
with a somewhat larger scatter of < 20%, and with a stronger trend 
of increasing scatter at earlier times. We report that we verified a 
similar time invariance for other tree quantities, such as the number 
of progenitors and the mutual probabilities of the two most massive 
progenitors. The above has been verified for the ACDM cosmology 
used in the current simulation. 

It would be interesting to identify the source of residual scat- 
ter in the average MP mass when Zo is varied and the time variable 
Auj is used. This scatter could have potentially been an artifact of 
the redshift dependence of the time steps used in the construction 
of the merger trees. For example, the MP may be the most mas- 
sive progenitor or not depending on the length of the time step. In 
order to test this, we used the zo = haloes to compare Pi at 
Auj ~ 0.4 as produced using different time steps corresponding 
to Auj ranging from 0.016 to 0.4. We find the resultant scatter to 
be negligible. A more relevant source of scatter is the environment 
dependence of halo formation time (Gao et al. 2005; Harker et al. 
2006), detected as a weak correlation between the redshift at which 
(Mi) = 0.5Mo and the environment density for a given Mo. This 
is especially true for haloes of masses Mo <C M*, where M*(z) 

1 One should correct a typo in eq. 5 of Wechsler et al. (2002), where a c 
should be replaced by a c /ao for a c to be the formation time as defined 
there, and independent of ao . Note also that they defined haloes based on 
A (z) within a sphere, while the MR haloes are based on FOF with a con- 
stant b = 0.2. 




1 2 3 1 2 3 

Aco Az 



Figure 1. Time invariance of the average MP history for two different time 
variables, Alu and Az. The three bundles of curves refer to three differ- 
ent halo masses: Mo = 1.2 X 10 12 , 1.5 X 10 13 . 1.6 X 10 14 h~ 1 MQ 
(green, red, blue from top to bottom). The curves in each bundle refer to 
20 = 0, 0.4, 0.7, 1.1, 2.4 (dotted, dashed-dotted, dashed, thin solid, thick 
solid line types, except for the most massive haloes, where the statistics is 
insufficient at zo = 2.4). In accordance with the EPS spirit, the use of Aui 
provides good time invariance, with deviations of less than 10% in (Mi), 
somewhat better than the scatter for Az. The scatter for the less massive 
haloes may be affected by an environment effect, and for high mass haloes 
sampling noise is dominant. 



is the Press-Schechter characteristic mass of nonlinear clustering. 
This may affect the curves in Fig. 1 because the typical environ- 
ment of haloes of a fixed mass is expected to vary with zo. This 
is likely to be a significant source of scatter for the low masses, 
Mo ~ 10 12 /i -1 M0. Poisson noise is important only in the mas- 
sive halo bin, where the number of haloes decrease from ~ 3000 
at Zo = to ~ 300 at zo = 1-1 (note that the mass bins used here 
are not the ones described in table 1). In order to test this noise we 
made 1000 runs of merger trees using our algorithm as described 
below. Each run contained 300 trees, for which the average mass 
was computed, similarly to the MR sample. The standard deviation 
between all theses averages gives a non-negligible error of ~ 4% 
at Alu — 2. This error is comparable to the scatter we see between 
different zo. 

Next we wish to identify a mass variable that would make Pi 
fit by a simple functional form, the same for all masses. An im- 
mediate choice could have been Mi /Mo, but we could not find 
a simple functional form involving this variable that would pro- 
vide a good robust fit to the simulation. Instead, we test ASi = 
S{Mi) - S(M ), where S(M) = a 2 (M) is the variance of the 
initial density fluctuation field, linearly extrapolated to z = 0, and 
smoothed using a window function that corresponds to a mass M. 
This is the natural mass variable used in EPS (Lacey & Cole 1993). 
Note that the natural time variable and this mass variable are re- 
lated via u>(z) — a[M»(z)], which serves as the definition of the 
Press-Schechter mass M*. We describe how we compute S(M) in 
appendix A. 

Figures 2 and 3 focus on the MP distribution 
Pi(AiSi|So, Auj) at several times Auj and for different fixed 
halo masses Mo at zo = 0. The units of Pi are AS 1-1 , so its 
integral over AS equals unity. We see that once the time-step is 
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Figure 2. The probability distribution of the MP mass, Pi(ASi |Srj, Aw). 
Halo masses, Mo at z = 0, correspond to the mass bins listed in table 1. 
The mass difference, AS\ = S(Mi ) — S(Mo), refers to the main progeni- 
tor back at the time corresponding to Aw = 1.9. Shown for each halo mass 
is the distribution deduced from the Millennium Run (filled circles), along 
with the global fit by the log-normal distribution of eqs. (l)-(3) (dashed 
curve). Shown in comparison (solid curve) is the distribution from 10 4 ran- 
dom realizations of merger trees generated by the algorithm described in 
§3.2, using the same distribution of Mq as in the MR. 



sufficiently long, Alu > 0.5, the simulated distribution resembles 
a log-normal distribution in ASi, 

■Hp) 



(InASi 



i\(A5i|S 0) Aw) = — ==exp 

(T p A5l V27T 

The moments are expressed as functions of Mq and Aw, 
a p = (a i log 10 M + a 2 ) log 10 Aco + a 3 log 10 Mo + a.4 



(1) 



(2) 



Hp = (&i log 10 M + b 2 ) log 10 A^ + 6 3 log 10 M + 64 , 

and the best-fit parameters are determined once, globally for all 
halo masses and times as listed in table 1, 

-3 



(01,02,03,04) = (-4.5 x 10 ,-0.34, -0.034, 1.04) 
(61,62,63,64) = (0.072,1.56,-0.22,2.54) , 



(3) 



Figure 3. Similar to Fig. 2, but for the fixed mass bin 1.4 X 10 12 h' 1 M Q 
and different time steps Alu as indicated. Note the different scaling of Pi 
in the different panels. 



for Mo is in units of h~ 1 MQ. We discuss in appendix B the quality 
of the global fit, and demonstrate that it improves with increasing 
time-steps, reaching at Auo > 1 an accuracy of ~ 20% for the first 
four moments of the distribution. Deviations from the global fit are 
apparent in the figure mainly for the lowest-mass haloes, where the 
minimum mass resolution is not negligible. The fits can obviously 
be improved further once the parameters are determined separately 
for each halo mass. 



3.2 A Markov-Chain Model 

We wish to generate random MP histories in a simple way through 
a sequence of equal, small time-steps, Awo, that sum up to the 
desired long time-step Aw. In each time-step i, we draw a ran- 
dom mass-step ASij from a fixed kernel probability function 
Ki(ASi,i\S). The probability Pi(ASi\S , Au) is obtained by 
summing up the small mass-steps ASi,; over all the time-steps. 
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The Markov chain is thus: 



ASxiSo) = A5i,i(S ) + ASi, 2 (S + AS 1 , 1 )+ 



(4) 



+ ASi,„ So + J2 ASl ' ! 



For example, in the case of two steps, 
Pi(ASiiS ,2Ao;o) = 



/fi(AS|S ) Xi(A5i - AS|S + AS)dAS . 



(5) 



A simple solution might have been to use K\ — 
Pi (ASi | So , Aujo) as extracted directly from the Millennium Run. 
However, this procedure fails because the MR trees are not Marko- 
vian for small time-steps, namely Pi also depends on the AS of 
previous time-steps. Consequently, the Pi from the MR is usable 
only for large time-steps, Aljo > 0.5. Such large time-steps are not 
good enough for certain applications which require a higher reso- 
lution in the merger history. In particular, long time steps involve 
multiple mergers, which have to be ordered in time for a proper 
evaluation of the merger rate (see §5). 

Our approach here is to assume a hidden Markov process that 
is valid also for short time-steps. By applying its kernel K\ over a 
sequence of short time-steps, we wish to recover the MP distribu- 
tion in the MR at a big time-step. We should emphasize that K\ is 
not Pi , and is not obtained from the log-normal fit to Pi . Nonethe- 
less, we find that a suitable kernel is also provided by a log-normal 
function, 



K X {AS\S) 



exp 



(InAS-^fc) 



cr fc = 1.367 + 0.012s + 0.234s 2 
Hk = -3.682 + 0.76s - 0.36s 2 



(6) 



where s = log 10 (S). The best-fit parameters of K\ were derived 
using a Monte-Carlo search scheme, optimizing the fit to the sim- 
ulation data (table 1) for Auj > 0.8. Throughout this work, quite 
arbitrarily, we apply A"i with a time-step Aujq = 0.1. We verified 
that any time-step in the range 0.01 < Auj < 0.2 can yield a sim- 
ilar success. However, we failed to match the MR data with time 
steps as small as Auj ~ 10~ , either because of a numerical effect 
or due to a more fundamental issue. 

It should be emphasized that our model kernel K\ guarantees 
that the mass of the main progenitor is monotonically increasing 
with time (namely M is always decreasing with uj), while this is 
not always true in the MR (see the small tail of AS < in Fig. 3). 
This may be important for semi-analytic models of galaxy forma- 
tion, where the recipes for the baryonic processes become more 
complicated when halo mass loss occurs. 

Figures 2 and 3 compare the MP distribution as generated 
from realizations of our Markov model to that deduced from the 
MR at several masses and times. For Auj > 1 and for the mass 
range tested here, the model recovers the data at the level of ~ 20% 
in terms of the first four moments of the distribution (Appendix B). 
For the highest mass bin, the accuracy of the fit at high Aui is ac- 
tually comparable to the simulation sampling noise. At short time- 
steps, Aw < 0.8, the deviations of the model from the data tend to 
be larger. 

While the model predictions of Pi deviate from the iV-body 
data at small time steps, the model manages to reproduce the av- 
erage mass of the MP quite accurately even at small time steps. 
This is demonstrated in Fig. 4, which compares the average mass 




0.5 1.0 1.5 2.0 



Figure 4. Average mass of the MP Mi at redshift z for haloes with mass 
Mo at zo = in the mass bins of table 1. The results from the MR are 
marked by solid circles. The averages from merger-tree realizations as gen- 
erated by our Markov model are the solid curves. The analytic fit of eq. (7) 
gives rise to the dot-dashed curves. The EPS predictions based on the an- 
alytic formula of Neistein et al. (2006) are plotted as dashed lines. The 
Markov model and the analytic fit provide good fits to the data. The analytic 
fit is not as good for the Mo = 10 11 h~ 1 Mq mass bin because eq. (7) 
does not take into account the minimum halo mass of the simulation. The 
EPS fit is not as good. 



of the MP by our Markov model with the data from the MR. The 
fit is excellent for all masses and at all time-steps. The deviations 
are below the ~ 1% level, much less than the scatter due to the 
deviations from time invariance when using Auj. Also shown in 
Fig. 4 are the predictions from the EPS model, as computed by the 
analytic formula proposed by Neistein et al. (2006). Our Markov 
model clearly performs much better than the EPS model. 

The Markov model presented here allows us to construct very 
efficiently many random realizations of the MP history. In particu- 
lar, the transformation from S to M, which is a demanding part of 
the computation, needs to be performed only at a small number of 
output times. Given that the log-normal distribution can be gener- 
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ated very efficiently, we were able to produce MP histories at a rate 
of ~ f 4 per second using a standard ~ 1GHz computer. 

3.3 Average Mass Accretion Histories 

For practical purposes, it is useful to provide a simple fitting for- 
mula that properly approximates the evolution of the average mass 
of the main progenitor in the Millennium Run. We use a functional 
form similar to the one describing the MP in the EPS theory (Neis- 
tein et al. 2006), in which the growth rate is given by 

-^- = -aM 12 , (7) 
and the corresponding mass growth function is 

Mi 2 (Aw|M ) = (M ;f 2 + apAoj)- 1 " 3 , (8) 

where M12 = (Mi)/10 12 /i _1 M , with (Mi) the average mass 
of the MP, and where M ,i 2 = M /10 12 /i _1 M . The best-fit 
parameters are a = 0.59 and f3 = 0.141. 

In order to express the growth rate in terms of time we write, 
dMi/dt = wdMi/dw. Recall that when approximating S c = 
const., Co is given by uj/uj — —D/D. For a better accuracy, we 
offer here a simple explicit approximation for w, 

w = -0.0470 [l + z + 0.1(1 + zy 1 - 25 ] 2 ' 5 h 73 Gyr" 1 , (9) 

where /173 is the Hubble constant measured in units of 73 
kms -1 Mpc -1 . This approximation is valid in ACDM with fl m = 
0.25 and fi A = 0.75 to better than 0.5%. 

The fitting function of eq. (8) is compared to the MR data in 
Fig. 4. The fit is good to better than 3% for the halo mass range 
studied and for Auj < 2.4. Based on the time invariance associ- 
ated with Aw, as discussed in §3.1, one can straightforwardly ex- 
trapolate the fitting formula for the MP history at higher redshifts. 
For example, the early history (z > 2) of the MP of a halo of 
10 14 /i _1 Mq at zo — is similar to the recent history (z > 0) 
of a 10 13 /i _1 M halo, once expressed in terms of Auj. The ac- 
curacy of the fitting formula is limited by the accuracy of the time 
invariance associated with Aoj, Fig. 1. The apparent deviation of 
the simulation data from the fitting formula for the halo mass of 
Mo ~ 10 11 /i _1 Mq and below stems from the minimum halo 
mass imposed in the simulations, M mln = 1.72 x 10 10 h~ 1 M Q . 



4 CONSTRUCTING FULL TREES 

The full merger tree involves much more than the MP history. The 
progenitors in each time step may involve one or more secondary 
progenitors above the minimum mass, and the mutual probabilities 
could in principle be rather complicated. Here, we find based on 
a simple symmetry rule that to a good accuracy all the progeni- 
tors can be drawn from the same kernel distribution function K a , a 
generalization of the K\ used for the MP. 

Recall that the MP kernel distribution K\ predicts the value 
of A Si. The mass of the MP should then be computed by Mi = 
M(So + ASi). The mass available for the additional progenitors 
is Mo — Mi. The simplest approach might be to use K\ again, 
this time in reference to Mo — Mi, in order to obtain AS 2 . The 
mass of the second progenitor can then be computed by M2 = 
M(S a + AS 2 ), where S a = S(M - Mi). This process can be 
repeated in order to draw all the progenitors in each time-step. This 
automatically guarantees that the mass of all progenitors will be 
smaller than M . Surprisingly, this straightforward algorithm gives 



good results for haloes of mass < 10 12 Ii~ 1 Mq. It turns out that 
the accuracy of the results is improved with a little modification, 
defining S a = S(0.95Mq — Mi) for the second progenitor, S a = 
S(0.95M - Mi - M 2 ) for the third progenitor, and so on. The 
results of this simple algorithm practically coincide with the results 
of the algorithm described below at M = 10 12 h~ 1 M Q . 

Encouraged by the success of the simple algorithm for small 
mass haloes, we wish to generalize it to more massive haloes, where 
the number of progenitors per time-step may be larger and the pro- 
genitors may be small. When generating the n'th progenitor in a 
given time-step, define 

ji-i 

Mieft = /Mo -^2 Mi, (10) 
1 

Sieft = S(Mleft) , 

where Mi is the mass of the i'th progenitor and / = 0.967 — 
0.0245 log 10 (So), except for the first progenitor where f — 1. 
The original kernel Ki of eq. (6), with the moments fik and at, 
is replaced by the log-normal function K a (AS\So, SWt), with the 
moments 

Ha = Mfe + (Sieft - S )(2.70 - 4.76s + 2.9s 2 ) , (11) 
oa = °k + (Sieft - So) (0.104 + 0.118s) , 

where s = log 10 (S ). 

The algorithm for constructing a full merger tree, above a min- 
imum halo mass M m i n , is thus as follows: 

(i) Draw a random ASi from the log-normal distributed K a de- 
fined in eq. (11) (note that Si e ft = So gives K\ from eq. 6). 

(ii) Compute the MP mass Mi = M(S + ASi). 

(iii) Compute Mi e ft and Sieft using eq. (10). If Mi e ft turns out 
larger than Mi, have Mj e ft = Mi. In this way, Mi is guaranteed 
to be the most massive progenitor. 

(iv) Draw a random AS 2 using the same K a of eq. (11) and 
compute M 2 = M(Si cft + AS 2 ). 

(v) If M 2 < M m i n , re-generate M 2 by repeating step (iv). 

(vi) Repeat steps (iii)-(v) until Mi e ft is smaller than M m i n . 

The above procedure is very efficient. The code we used for 
constructing the trees is able to produce full trees from z = up 
to z ~ 8 at a rate of ~ 10 5 x M m i n /Mo per second. For example, 
with Mo = 10 14 hr l M Q and M min = 1.72 x 10 10 h^M® as in 
the MR, a typical tree is constructed at 0.02 seconds using our ~ 1 
GHz computer. This tree has a total number of 15,000 progenitors 
on average. 

Figure 5 displays the progenitor mass function AN /AM. The 
quantity plotted is actually AN /A log M times M/Mj, so that each 
equal log interval along the x-axis contributes to the total mass Mo 
in proportion to the corresponding value on the y axis. The results 
from the MR are compared to the results from merger trees that 
were generated using our algorithm. When the time step is not suffi- 
ciently large (z = 0.4), our algorithm shows some deviations from 
the simulation results because of the non-Markov effects in the lat- 
ter are still non-negligible. At higher 2 the fit is better, but not per- 
fect. Deviations as high as ~ 20% can be seen at low z for massive 
haloes and at high z for small haloes. This is significantly better 
than the EPS predictions also shown in Fig. 5, which show devia- 
tions of a factor ~ 2 — 3 in many cases. It is likely that even better 
results can be obtained after a more elaborate tuning of our model 
parameters, though the accuracy is limited by the imperfections in 
the time invariance even when Aoj is used, and the limitations of a 
Markov model discussed in §6. 
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Figure 5. The progenitor number density AN /AM at redshift z for haloes of mass Mo at 20 = 0- The data from the Millennium Run (filled circles) are 
compared to the results from merger trees generated by our algorithm (solid curve). We used 20,000, 3,000 and 600 random trees in the three mass bins, 
respectively. Also shown is the EPS prediction based on eq. (CI) (dashed curve). 



The results of Fig. 5 can be compared to the results reported 
in parallel by Cole et al. (2007), who provide global fitting function 
for AN /AM from different FOF merger trees of the MR. 

Figure 6 shows the average mass encompassed in all the pro- 
genitors above M mlrL as a function of Aui. This is the average sum 
Afaii = Mi, or the integral of the mass function of Fig. 5 — 
a quantity of interest for several applications (Navarro et al. 1997; 
Neistein et al. 2006; Neto et al. 2007). The results from the MR are 
compared to the averages from many realization of merger trees 
generated by our algorithm. An interesting feature of Mali is that 
for small Auj it shows a rather weak dependence on halo mass. 
This implies that during that epoch all haloes gain the same frac- 
tion of their mass via smooth accretion below M m in, despite the 
fact that Afo /M m i n varies. This is related to the fact that the pro- 
genitor mass function AN /AM has a similar tail at low masses for 
all halo masses. The last point can be seen in fig. 5, as histograms 
of different Mo but with the same 2 are all similar at the low mass 
end. 

The algorithm presented above has been empirically tuned to 
reproduce the distribution of the MP mass Pi and the total mass 
function AN/AM, at big enough time-steps (Aui > 0.8). Lack- 
ing an obvious physical motivation, it is not guaranteed a priori 
to also recover the correct full joint distribution of progenitors. 



Nevertheless, we find that the algorithm manages to reproduce 
the second progenitor with adequate accuracy over a large range 
of halo masses. This is demonstrated in the next section, where 
the second-progenitor distribution is recovered quite accurately for 
10 13 /i _1 M© haloes. The algorithm may be less accurate for very 
small progenitors, M < O.OlMo, but these progenitors encompass 
only a small fraction of the mass at Aui = 0.1, a few percents for 
M = 10 13 H^Mq, 



The algorithm presented here can be compared to the one by 
Sheth & Lemson (1999), motivated by Poisson initial conditions. 
These authors have developed an algorithm that is based on the 
notion that mutually disconnected volumes inside a halo are mu- 
tually independent. As a result, all the progenitors are drawn from 
the same probability distribution depending on the density in each 
region. In our algorithm it is the mass steps ASi that are almost 
independent, although the progenitor masses depend on each other 
through Afi c ft. Our current study focuses on providing a recipe that 
reproduces the TV-body simulation data, but the symmetry that lies 
at the basis of our successful algorithm may provide interesting 
clues that may lead to a more physical model. 
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Figure 6. The average mass of all progenitors, M a n, as a function of time 
step, Aui, for haloes of different masses as listed in Table 1 . The data from 
the simulation are marked by squares, triangles and circles for Mo = 1.4 X 
10 12 , 2 X 10 13 and 2.1 X 10 14 respectively. The corresponding model 
predictions are marked by green dashed, red dot-dashed and blue solid lines 
respectively. 



5 MERGER RATES 

It is often very useful to extract from merger trees the merger rates 
of haloes of different masses. This is a key ingredient in galaxy- 
formation models, where major merger are assumed to be an impor- 
tant channel for the formation of star bursts, spheroidal stellar sys- 
tems and AGNs. The progenitor mass function dTV/dM addressed 
above is clearly not enough to constrain the merger rates (e.g. Sheth 

6 Lemson 1999). 

In the simulation, and in our Markov model, there are cases 
were a halo has many progenitors per time-step, especially when 
the halo is massive or when the time step is large. Because the or- 
der by which progenitors merge may change the results, a complete 
self-consistent treatment of merger rates should properly address 
all possible merger sequences within a time-step. Here we limit our 
analysis to the joint probability of the two most massive progeni- 
tors, Pi, 2 (Mi, M 2 |M , Aw), with Mi 5? M 2 . In fact, we define 
here the merger-rate kernel to be similar to Pi, 2, but with the addi- 
tional simplifying constraint that no other mergers occur during the 
time-step Auj. 

This approximation may admittedly be somewhat crude. On 
one hand, we learn from the MR simulation that for 10 13 h _1 MQ 
haloes and Aui = 0.1 about ~ 90% of the merger events with 
Mi /Mi > 0.05 involve only Mi and M 2 . On the other hand, the 
residual mass in all other progenitors is on average about one third 
of M2, i.e., not negligible. Had we merged these small progenitors 
with M2 prior to its merger with Mi, the change in AT 2 would have 
induced a non-negligible change in the quoted merger rate for Mi 
and M 2 . Our approximation for this merger rate becomes better 
if the smaller progenitors merge first with the much larger Mi, or 
merge after the Mi-M 2 merger altogether. 

Figure 7 compares Pi, 2 from 10 4 merger trees generated by 
our algorithm with MR trees for a halo of mass 2 x 10 13 Ii~ 1 Mq 
and for Auj = 0.1 and 1.7. Our algorithm nicely fits the simula- 
tion at big Aui. The fit is only qualitative at the small time step, 
Alu = 0.1. We know already that deviations along the Mi axis are 
expected due to the non-Markov behaviour of the MP (§3.2), and 
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Figure 7. Joint distribution of the two most massive progenitors Pi , 2 . Each 
panel shows two snapshots in time, at Au> = 0.1 and 1.7. The contour 
levels are at Pi, 2 = 5, 10, 30 Mg 2 . The results from the MR for the mass 
bin of 2 X 10 13 /i _1 Mg are shown as dashed blue contours. Upper panel: 
The thick solid red contours refer to realizations of merger trees generated 
by our algorithm. The thin green contours are the analytic approximation of 
eq. (12). Lower panel: The solid red contours refer to realizations of EPS 
merger trees using the algorithm of Somerville & Kolatt (1999). 



we will see below (§6) that the deviations along the M2 axis are 
also unavoidable. 

The lower panel of Fig. 7 shows results from EPS merger trees 
constructed using the standard algorithm of Somerville & Kolatt 
(1999). We see that this algorithm underestimates the mass of the 
second progenitor at all time steps. This discrepancy was not ob- 
vious in Somerville et al. (2000, Fig. 9), probably because of the 
rather small ratio of Mo/M m i n ~ 40 used there (in comparison 
with ~ 1000 here). Different algorithms based on EPS may yield 
different results, and our preliminary study indicates that it would 
be possible to develop an EPS algorithm in a spirit similar to our 
current model such that its Pi 2 will provide a better fit to the TV- 
body results. 

Our results could be compared to the estimate by Lacey & 
Cole (1993) for mergers in the limit of infinitesimal time steps 
within the framework of EPS. They assumed that in this limit 
merger events are binary, so M2 is fully determined by Mi and 
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Mo - The assumption of binary mergers in small time-steps can be 
tested in Fig. 7, where the distribution of Pi,2 for Auj = 0.1 is 
clearly peaked near the line Mi +M 2 = Mo . This may explain why 
Lacey & Cole (1994) found a good match between their EPS for- 
mula and results from TV-body simulations. However, this approach 
is not fully consistent. Using infinitesimal time steps Pi, 2 actually 
converges to a Dirac delta function about (Mi, M2) = (Mo, 0), 
and therefore cannot be used to predict Pi, 2 at finite time steps. 
Moreover, we show in Appendix C that in the limit of small time 
steps the EPS formalism does not converge to binary mergers. This 
may explain why the formula of Lacey & Cole (1993) fails to yield 
the correct symmetry between the two merging progenitors (Ben- 
son et al. 2005). 

Our algorithm can be expressed in terms of an analytic esti- 
mate for Pi, 2- We assume that the second progenitor drawn is also 
the second most massive. In this case: 



Pi, 2 (Mi,M 2 |M , Auj =0.1) = 

K a (ASi\So, So) ■ Ka{AS 2 \So, S lcft ) 



(12) 



dS(Mi) dS(M 2 ) 



AM AM ' 

where A Si = S(Mi) - S(M ), AS 2 = S(M 2 ) - Si eft , Si eft 
is defined by eq. (10) with n = 2, and K a is given by eq. (11). 
This approximation is shown for the short time step in the upper 
panel of Fig. 7. We see that the analytic expression provides a crude 
approximation for the results from the merger trees constructed by 
the full algorithm and the MR simulation for M 2 /M > 0.05. The 
analytic approximation apparently fails at lower values of M 2 . This 
is because the second progenitor drawn is no longer necessarily the 
second most massive. 

Our approximate formula for the merger rate Pi, 2, eq. (12), 
is time-invariant; it holds for any measurement redshift zo where 
Mo is identified. Its change with time becomes apparent only when 
the rate is expressed with respect to a unit of time rather than oj, 
i.e., the merger rate is oc w. In the ACDM cosmology used here 
(1) oc (1 + z) m where m varies from ~ 2.2 at low redshift to an 
asymptotic value of 2.5 at high redshift (see eq. 9). Early studies of 
merger rates in TV-body simulations found somewhat higher values 
in the range 2.5 < m < 3.5 (e.g. Governato et al. 1999; Gottlober 
et al. 2001). It is not clear at this point how accurate these TV-body 
estimates are. If future measurements of TV-body merger rates in- 
deed turn out different from our time-invariant predictions, one can 
think of several potential reasons for such deviations. First is the 
non-Markov nature of TV-body merger rates at small time steps. If, 
for instance, it is due to the finite relaxation time after a merger, and 
if this time is associated with the halo dynamical time that varies 
with redshift, then the non-Markov effects may vary with redshift. 
Second is the imperfection of the time invariance when using Auj. 
Thirdly, the deviation may arise from the differences between Pi, 2 
and the actual merger rate, where multiple mergers are not negligi- 
ble. 



6 MARKOV AND NON-MARKOV PHASES 

Despite the fact that the TV-body trees are not Markovian at small 
time-steps, we saw that our Markov algorithm manages to repro- 
duce many of the tree properties across big time-steps, including 
the MP distribution, the progenitor mass function, the merger rates 
and the total mass in all the progenitors. We also saw some inaccu- 
racies of the Markov model in reproducing the tree properties and 
merger rates. In particular, Fig. 7 indicates that while the average 
mass of the main progenitor is reproduced quite accurately, the av- 
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Figure 8. Average MP history for haloes of a given mass identified at 
zo = 0.4, grouped according to their future between 2 = 0.4 and 2 = 0. 
The three solid blue curves refer to haloes of mass 1.4 X 10 12 /i _1 -Mq. 
Shown is the evolution of all the haloes (thick line), those that end up 
as 2 X 10 13 haloes at 2 = (medium line), and those that end up as 
2.1 X 10 14 h~ 1 MQ at 2 = (thin line). The two dashed red curves 
are for haloes of 1.5 X 10 13 h _1 MQ at 20 = 0.4, where the thick 
line is all haloes, and the thin line is the average of haloes that end up as 
2.1 X 10 14 h~ 1 MQ haloes at 2 = 0. All progenitors show the familial' 
growth at a uniform rate at early times, while those that end up as massive 
haloes at 2 = show mass loss during the last Aoj ~ 0.1 before 20 = 0.4. 
The point were the slope of each curve stalls deviating from the slope at 
high redshift is marked by an arrow. 



erage mass of the second progenitor is inaccurate. Here we address 
additional non-Markov aspects of the small-progenitor behaviour. 

In Fig. 8 we show average MP histories of haloes of a given 
mass as identified at zo = 0.4, grouped according to their future 
evolution from z — 0.4 to z = 0. We see that on average those 
haloes that will end up as part of a more massive halo at z = 
have already suffered an abnormally slow growth starting Auj ~ 
0.2 or more before zq = 0.4. This actually turns (on average!) 
into a period of mass loss during the last Auj ~ 0.1 just before 
zo- This is clearly a non-Markov behaviour. It is probably due to 
tide-limited accretion at the vicinity of massive haloes and tidal 
stripping once passing inside such haloes (Diemand et al. 2007; 
Desjacques & Dekel 2007; Hahn et al. 2007). At early times, e.g., 
more than Auj ~ 0.5 prior to zo, the growth rate of these special 
haloes is similar to the average of all the haloes, but the value of Mi 
at any given time is ~ 1.5 times higher. The transition from average 
growth rate to a suppressed growth rate can be identified, as marked 
by the arrows in Fig. 8. We could interpret this as transition from a 
Markov to non-Markov behaviour. 

The non-Markov effects limit the accuracy of our model. In 
particular, the progenitor mass function of Fig. 5 do not approach 
a Markov behaviour even at high redshift, in the sense that the suc- 
cess of the Markov model in one time step does not guarantee its 
success in other time steps. This is because many of the progen- 
itors present at a given redshift are about to merge into a much 
bigger halo a short time later, and are therefore subject to mass loss 
that induces a non-Markov behaviour. This explains why our model 
fits for AN /AM show non-negligible deviations from the simulated 
mass functions. 

The fact that the mass of some haloes is not monotonically 
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increasing with time is doomed to complicate the interpretation of 
the merger rates, even if the mergers are counted properly in a given 
time-step in the simulation. It is not obvious how to formulate a 
self-consistent and time-invariant recipe for merger rates given that 
these haloes were actually more massive at some point in the past. 
Nevertheless, despite the shortcomings of any Markov model, it 
provides a sensible basis for a self-consistent definition of merger 
rates. For one thing, the merger rates of a non-Markov model are 
likely to have an undesired dependence on the length of the time 
step chosen for the tree. Our Markov model tends to overestimate 
the mass of the secondary progenitors at small time-steps (seen in 
Fig. 7 as a stretching of the model contours toward higher values of 
M2), thus approximately compensating for the opposite effect due 
to mass loss. This is an outcome of the model tuning, designed to 
fit the data of AN /AM at large time steps. 

The non-Markov effect seen in Fig. 8 is related to the envi- 
ronment dependence of the assembly time for distinct haloes (Gao 
et al. 2005; Harker et al. 2006) through the natural correlation be- 
tween the future halo mass and its current environment. Figure 8 
demonstrates why the formation redshift of a halo that resides in 
a high-density environment is higher than average. With higher 
mass loss prior to zo, the "formation time", when the MP was 
Mi = 0.5 Mo, is clearly earlier. This is quantified in Hahn et 
al. (2007) and Desjacques & Dekel (2007). We see that the envi- 
ronment effect is at least partly associated with haloes in their non- 
Markov phase where they are about to merge into bigger haloes. We 
may therefore expect a weaker or no environment effect for haloes 
in their Markov phase of typical monotonic growth, before the tran- 
sition marked by an arrow in Fig. 8. This division into haloes in the 
Markov phase versus those in the non-Markov phase might be a 
natural way to divide the halo population, more physical than the 
standard division to "distinct haloes" versus "subhaloes" based on 
the virial radius. 

Simulations and observational data indicate that, unlike dark 
haloes, the stellar galaxies that reside in them tend not to show 
a significant environment dependence (Croton et al. 2007; Tinker 
et al. 2007). While, as seen in Fig. 8, the mass-loss induced envi- 
ronment effect occurs mainly at late times, the stellar systems might 
have crystalized as compact systems at earlier times, which makes 
them less subject to tidal effects. 



7 DISCUSSION 

We addressed the statistics of dark-matter merger trees, as extracted 
from the Millennium iV-body simulation. We demonstrated that the 
time and mass variables of the EPS formalism, cu(t) and a 2 (M), 
are indeed the natural variables for describing merger trees in a 
time-invariant way, at an accuracy level of a few percent. It may be 
interesting to explore different ways to define haloes in an attempt 
to improve the time invariance of the statistics. This includes, for 
example, different systematic time variations of the overdensity or 
linking length used to define the haloes. It may also be worthwhile 
to test the time invariance in an idealized Einstein-deSitter cosmol- 
ogy with a power-law power spectrum, where there could be a bet- 
ter chance to isolate the non-Markov contribution to any violation 
of self-similarity. 

The log-normal nature of the distribution of the main progen- 
itor as a function of a 2 (M) may be a clue to the physical origin 
of the statistics of merger trees. It may be associated with a prod- 
uct of multiple random processes through the central-limit theorem, 
but this is beyond the scope of our current analysis. It may be in- 



teresting to evaluate to what extent this log-normal behaviour is 
valid in different cosmological models, which could be interpreted 
as representing different density environments in a given ACDM 
cosmology. It should also be interesting to test the changes induced 
by using a different window function in the definition of a 2 (M). 

Despite the non-Markov nature of iV-body trees, we showed 
that they can be approximated by a Markov process of short time- 
steps that reproduces the progenitor distribution at sufficiently long 
time-steps, Au> > 0.5. The average main-progenitor history is ac- 
tually recovered accurately even at short time steps. In addition, 
the distribution of full iV-body merger trees can be reproduced by 
a similar probability distribution function for all the progenitors. 
The progenitors are drawn one after the other from the mass left in 
the descendant halo after subtracting the progenitors chosen so far. 
We demonstrated that the joint distribution of the two most mas- 
sive progenitors is reproduced quite accurately. This algorithm can 
thus be used to construct semi-analytic merger trees that resemble 
the statistics of iV-body merger trees better than any previous al- 
gorithm. It is in principle applicable at any desired mass resolution 
and in any cosmological model. However, the non-Markov features 
of the merger trees limit the accuracy. Preliminary tests indicate that 
a similar model can possibly be developed for merger trees based 
on the EPS formalism. 

Extracting merger rates from the simulation is a non-trivial 
task. First, with several progenitors in each time-step, the order by 
which they merge matters for the merger rates and should be prop- 
erly modeled. Second, the non-Markov suppression of growth rate, 
e.g., due to tidal effects makes the progenitor mass just prior to a 
merger differ from the masses as extrapolated from the same pro- 
genitors at high redshift. Still, we deduce from our Markov algo- 
rithm a simple approximation to the merger rate kernel for the two 
most massive progenitors. Once applied over short time-steps, it re- 
produces the high- z progenitor mass with good accuracy. The time 
invariance of our algorithm implies that the merger rates evolve in 
time in proportion to uj ~ (1 + z) m , where m ranges from ~ 2.2 
at low 2 to 2.5 at high z. This time invariance may be invalidated 
by non-Markov effects that evolve with time, such as the dynamical 
time of haloes. For all the reasons above, the success of our approx- 
imate merger rates in reproducing the actual iV-body merger rates 
is yet to be evaluated. 

Our algorithm suggests a natural distinction between Markov 
and non-Markov haloes, or phases in the evolution of a halo. The 
Markov phase is when the halo grows monotonically in time in a 
rate close to the average rate, before it is suppressed, presumably 
by tidal effects in the neighborhood of massive haloes. The popu- 
lation of Markov haloes should not show the environment depen- 
dence of halo formation time (Gao et al. 2005; Harker et al. 2006). 
This distinction would rely on non-local halo properties, such as 
its proximity to more massive structures. A practical definition of 
non-Markov haloes may be those that will become subhaloes of a 
bigger halo in the next time interval corresponding to Auj ~ 0.5. 
However, this particular tentative definition has the undesired effect 
of smoothing the time resolution of the tree. Working out a similar 
distinction without suppressing the tree resolution is an interesting 
challenge for future work. 

MATLAB and C codes of the algorithm pre- 
sented in this paper are available on the web at 
http://www.phys.huji.ac.il/~eyal_n/merger_tree/ and can be 
used as a black box for constructing merger trees. 
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APPENDIX A: COMPUTING u> AND S 

In this section we describe in detail how we compute the natural 
variables, uj(z) and S(M). The cosmological parameters in the MR 
are (Ov, O m , ct 8 , /i) = (0.75, 0.25, 0.9, 0.73). We use the stan- 
dard power spectrum P(k) — kT 2 (k), with the transfer function 
(Bardeen et al. 1986) 



T(k) 



ln(l + 2.34g) 
2.34g 



(Al) 



[1 + 3.89q + (16.1g) 2 + (5.46g) 3 + (6.71g) 4 ] 



-1/4 



Here q = k/V, with k in /iMpc -1 , and T — 0.169 is the power 
spectrum shape parameter chosen to best fit the CMBFAST model 
(Seljak & Zaldarriaga 1996) used in the MR. 

We use the definition of S(M) from Lacey & Cole (1993) as 
the variance of the density field smoothed with a spherical top-hat 
window function of a radius that on average encompasses a mass 
M in real space. In practice we use the fitting function given by van 
den Bosch (2002b): 



S(M) = u 



c r 



1/3 



M 



1/3 



-I 



u 2 (32r) ' 

where Co = 3.804 x 10~ 4 , and u{x) is an analytical function: 

0.3 



(A2) 



u(x) = 64.087 [l + 1.074x° 



(A3) 



-1.581a; - 4 + 0.954a; - 5 



0.185a;' 



().6j 



-10 



In order to compute cu(z) we use the recipe from Navarro et al. 
(1997), which uses for the ACDM cosmology: 



oj = 1.6865 
where 



D(z) 



Sl m {l + zf 



(A4) 



(A5) 



On(i + zf + (i - n m - O0(i + z) 2 + Ov ' 

The linear growth rate D(z) is computed by performing the inte 
gral: 



/°° 1+2 



(A6) 
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where Do is a constant set by the normalization D(0) = 1. We 
provide a practical approximation for u>(z), 



i(z) = 1.260 [1 + z + 0.09(1 + z)' 1 + 0.24e" 



(A7) 



which is accurate to better than 0.5% at all redshifts for the ACDM 
cosmology used here. As mentioned in section 3.3, the time deriva- 
tive of uj can be well approximated by: 

-1.25] 2 - 5 



= -0.0470 1 + £ + 0.1(1 +«)" 



h n Gyr" 1 , (A8) 



where /173 is the Hubble constant measured in units of 73 
kms -1 Mpc -1 . This is also good to better than 0.5% at all red- 
shifts. 



APPENDIX B: GOODNESS OF FIT FOR THE MAIN 
PROGENITOR 

In this Appendix we evaluate the quality of fit of the two mod- 
els presented in §3 to the distribution of main-progenitor mass 
in the MR simulation. First the straight-forward global fit for 
Pi(ASi|So, Aw), eqs. (l)-(3) is examined. The quality of this fit 
to the MR data is evaluated in Fig. B 1 via the fractional deviations 
in the first four moments of Pi , the mean, standard deviation, skew- 
ness and kurtosis. The fit is reasonably good starting at Auj ~ 0.8, 
with deviations of ~ 20% in all moments, and with the skewness 
showing somewhat larger deviations. One reason for these devia- 
tions is the global nature of the fit, being performed once for all 
masses and times. Naturally, separate fits in limited mass ranges 
or epochs will yield better results. Another source of scatter is the 
limited sampling, which contributes an error of ~ 0.5% in the most 
massive bin. For this bin, the sampling error is comparable to the 
deviations of the model average from the data. 

Also shown in Figure Bl is the difference between the MP 
distribution as derived from 10 s merger-tree realizations of our 
Markov model with the kernel K\ and the distribution in the MR 
simulation. Results of the same Markov model are also displayed in 
Fig. 3, where a specific halo mass is followed in time. The sources 
of scatter discussed above are also valid here. Additional scatter 
arises from the differences between the time-steps of our model 
and those in the simulation. As our model uses a fixed kernel with 
Awo = 0.1, we generate predictions only at times which are inte- 
ger multiples of Aujq = 0.1. We pick the closest possible output 
times from the MR, but this is only good to 10% in Auj/uj at low 
Alu, and 0.5% at high Auj. This source of scatter can be weakened 
by interpolation between time steps. 
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Figure Bl. Goodness of fit for the mass distribution of the main progenitor 
Pi (ASi | So, Auj). The two models of §3 are compared to the MR simu- 
lation. For each Mq and Au> we show the fractional deviation in the first 
four moments of the distribution. These moments are the mean, standard 
deviation, skewness, and kurtosis. The solid curves refer to the merger tree 
realizations using K\ versus the simulation. The dashed curves refer to the 
global log-normal fit of eqs. (l)-(3) in comparison with the simulation. The 
halo masses are in the three massive bins defined in table 1 , with the thick- 
ness of the line increasing with halo mass. 



APPENDIX C: BINARY MERGERS IN EPS? 

We define a "binary merger" event by having exactly Mi + M2 = 
Mo in a given time step. We show here that this is not a valid limit in 
the EPS formalism when the time step is infinitesimal. The number 
density of progenitors as predicted by EPS is (e.g., Lacey & Cole 
1993) 

diV 
dM 



(M,z\M ,z ) dM 

M 1 Auj 
~M AtpT* 



exp 



Auj 2 



2AS 



dS 



dM 



dM. 



(CI) 



When the progenitors of all masses down to M — *■ are consid- 
ered, this implies that any halo has an infinite number of progen- 
itors at any previous time, not permitting a binary event even at 



small time-steps. Only when a minimum halo mass M m i n is im- 
posed can a binary merger occur. However, we show below that the 
mass in "progenitors" below M m in, which one may term "smooth 
accretion", M aC c, is never negligible compared to M2. 

The average M acc is obtained by integrating dN/dM x M 
between and M m i n , 



(M acc ) 



erf 



Auj 



2S 



(C2) 



Mo 

where SWim = S(M m i n ) and So = 5(Mo). It has been shown in 
Neistein et al. (2006) that the main-progenitor distribution at small 
time-steps equals dN/dM for M > Mo/2, with a small tail ex- 
tending to low masses M < Mo /2. Consequently, the probability 
of the second progenitor, P2, roughly equals dN/dM in the range 
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Af m i n < M < Mo /2. Since the latter is always a slight overesti- 
mate, we use it for an upper limit to the average mass of the second 
progenitor. Integrating AN /AM x M we obtain 



(M 2 ) 



< erf 



Au 



\/2S : 2 — 2So _ 



erf 



Au 



Mo 

where S 2 = S(M /2). 

In the limit of small time-steps, Aw 

2x/^/n as x — * 0, we get 



! — 25o . 
0, using erf (x) 



(C3) 



(C4) 



(M 2 ) / Smin-So 

(M acc ) "V S 2 -5 

For the ACDM cosmology used here, and with the minimum mass 
of 1.72 x 10 10 H~ 1 Mq in the Millennium simulation, this up- 
per limit varies between 2 and 6.5 for haloes of mass 10 12 to 
10 14 h~ 1 M Q . The actual value of (M 2 )/(M acc ) is somewhat 
lower, and it may depend on the specific algorithm used to con- 
struct the trees. 

One may argue that in eq. (C4) we can take Smin to in- 
finity as M m i n goes to zero, so (M2)/(M acc ) will approach in- 
finity as well. Apparently, this procedure may seem to eliminate 
the minimum mass and make the accreting mass vanish such that 
the limit of binary mergers is reproduced. We want to emphasize 
that this limit is not well defined in EPS. It can be shown that in 
the limit S oc AS — > oo and Au — > eq. (CI) approaches 
AwM _1 S _1 AS. This expression actually depends on the way 
by which each variable approaches its limit, so the procedure can 
practically yield an arbitrary result. 

Thus, the accretion mass is always comparable to Mi, even 
when they both vanish linearly with Auj. This implies that the 
merger rate as computed by Lacey & Cole (1993) (their eq.2. 17) 
is not self-consistent within the EPS formalism and may there- 
fore be invalid. It may explain why Benson et al. (2005) found this 
merger rate problematic. The situation is different in merger trees 
constructed from Af-body simulations, where every given halo has 
a finite number of particles, thus introducing a natural M m i n at the 
particle mass. In this case, binary mergers occur in the limit of small 
time-steps, as there is no smooth-accretion component. 



